-
Notifications
You must be signed in to change notification settings - Fork 586
perf(actions): lazy initialization of embedding indexes #1572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR successfully implements lazy initialization of embedding indexes to improve startup performance and reduce unnecessary FastEmbed downloads.
Key Changes
V1.0 (nemoguardrails/actions/llm/generation.py)
- Removed eager
init()call from__init__that was running in a separate thread - Added
_init_lock(asyncio.Lock) for thread-safe lazy initialization - Implemented three
_ensure_*_index()methods using double-checked locking pattern - Updated all action methods (
generate_user_intent,generate_next_step,generate_bot_message,generate_value,generate_intent_steps_message) to call ensure methods before index usage - Removed unnecessary imports:
threading,check_sync_call_from_async_loop,get_or_create_event_loop - Added early return condition in
_init_bot_message_index()when no user_messages exist
V2.x (nemoguardrails/actions/v2_x/generation.py)
- Implemented
_ensure_flows_index()and_ensure_instruction_flows_index()methods - Uses
hasattrchecks forinstruction_flows_indexsince it's created dynamically - Updated all methods that use flow indexes to call ensure methods first
Behavior Changes
- Input/output rails only: FastEmbed no longer loaded at initialization
- Passthrough mode: No embedding index initialization
- Dialog rails with user messages: FastEmbed loaded on first
generate()call - All existing functionality preserved, just deferred until actually needed
Test Coverage
- Comprehensive tests verify indexes are None after initialization
- Tests verify FastEmbed cache stays empty for simple configurations
- Tests verify proper lazy loading when dialog rails are used
- Concurrent initialization tests ensure thread-safety of double-checked locking
Confidence Score: 5/5
- Safe to merge - well-implemented lazy initialization with proper thread-safety and comprehensive test coverage.
- The implementation uses the standard double-checked locking pattern with asyncio.Lock correctly. All action methods that access embedding indexes have been updated with appropriate ensure*_index() calls. The code properly handles edge cases where indexes may remain None after initialization (e.g., empty message lists). Comprehensive test coverage validates both the lazy behavior and thread-safety. No breaking changes to existing functionality.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/actions/llm/generation.py | 5/5 | Implemented lazy initialization for embedding indexes with proper double-checked locking pattern. Removed eager init() call and unnecessary imports (threading, patch_asyncio). All action methods properly call ensure methods before index usage. |
| nemoguardrails/actions/v2_x/generation.py | 5/5 | Added lazy initialization for V2.x with _ensure_flows_index() and _ensure_instruction_flows_index() methods. Properly handles instruction_flows_index attribute that may not exist initially using hasattr checks. |
| tests/test_actions_llm_embedding_lazy_init.py | 5/5 | Comprehensive test coverage for V1.0 lazy initialization including verification that indexes are None at init, FastEmbed cache remains empty for simple configs, and indexes are initialized on first use. |
| tests/v2_x/test_llm_embedding_lazy_init.py | 5/5 | Test coverage for V2.x lazy initialization including passthrough mode and dialog configurations. Verifies indexes and instruction_flows_index remain uninitialized for simple configs. |
Sequence Diagram
sequenceDiagram
participant User
participant LLMRails
participant LLMGenerationActions
participant EmbeddingIndex
Note over LLMRails,LLMGenerationActions: Before PR: Eager Initialization
User->>LLMRails: __init__(config)
LLMRails->>LLMGenerationActions: __init__()
LLMGenerationActions->>LLMGenerationActions: init() in thread
LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded
LLMGenerationActions->>EmbeddingIndex: _init_bot_message_index()
LLMGenerationActions->>EmbeddingIndex: _init_flows_index()
LLMGenerationActions-->>LLMRails: Ready (all indexes loaded)
Note over LLMRails,LLMGenerationActions: After PR: Lazy Initialization
User->>LLMRails: __init__(config)
LLMRails->>LLMGenerationActions: __init__()
Note over LLMGenerationActions: No init() call!
LLMGenerationActions-->>LLMRails: Ready (no indexes loaded)
User->>LLMRails: generate(messages)
LLMRails->>LLMGenerationActions: generate_user_intent()
LLMGenerationActions->>LLMGenerationActions: _ensure_user_message_index()
alt user_message_index is None and user_messages exist
LLMGenerationActions->>LLMGenerationActions: acquire _init_lock
LLMGenerationActions->>LLMGenerationActions: check again if None
LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded (lazy)
end
LLMGenerationActions->>EmbeddingIndex: search(text)
EmbeddingIndex-->>LLMGenerationActions: results
LLMGenerationActions-->>LLMRails: intent
Note over User,EmbeddingIndex: Input/Output Rails Only: No Indexes Loaded
User->>LLMRails: __init__(config with input rails)
LLMRails->>LLMGenerationActions: __init__()
LLMGenerationActions-->>LLMRails: Ready (no indexes)
User->>LLMRails: generate(messages)
Note over LLMGenerationActions: No embedding indexes needed
LLMRails-->>User: response (FastEmbed never loaded)
Documentation preview |
aa30386 to
46e4df9
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@greptileai re-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Overview
Greptile Summary
This PR successfully implements lazy initialization of embedding indexes to improve startup performance and reduce unnecessary FastEmbed downloads.
Key Changes
V1.0 (nemoguardrails/actions/llm/generation.py)
- Removed eager
init()call from__init__that was running in a separate thread - Added
_init_lock(asyncio.Lock) for thread-safe lazy initialization - Implemented three
_ensure_*_index()methods using double-checked locking pattern - Updated all action methods (
generate_user_intent,generate_next_step,generate_bot_message,generate_value,generate_intent_steps_message) to call ensure methods before index usage - Removed unnecessary imports:
threading,check_sync_call_from_async_loop,get_or_create_event_loop - Added early return condition in
_init_bot_message_index()when no user_messages exist
V2.x (nemoguardrails/actions/v2_x/generation.py)
- Implemented
_ensure_flows_index()and_ensure_instruction_flows_index()methods - Uses
hasattrchecks forinstruction_flows_indexsince it's created dynamically - Updated all methods that use flow indexes to call ensure methods first
Behavior Changes
- Input/output rails only: FastEmbed no longer loaded at initialization
- Passthrough mode: No embedding index initialization
- Dialog rails with user messages: FastEmbed loaded on first
generate()call - All existing functionality preserved, just deferred until actually needed
Test Coverage
- Comprehensive tests verify indexes are None after initialization
- Tests verify FastEmbed cache stays empty for simple configurations
- Tests verify proper lazy loading when dialog rails are used
- Concurrent initialization tests ensure thread-safety of double-checked locking
Confidence Score: 5/5
- Safe to merge - well-implemented lazy initialization with proper thread-safety and comprehensive test coverage.
- The implementation uses the standard double-checked locking pattern with asyncio.Lock correctly. All action methods that access embedding indexes have been updated with appropriate ensure*_index() calls. The code properly handles edge cases where indexes may remain None after initialization (e.g., empty message lists). Comprehensive test coverage validates both the lazy behavior and thread-safety. No breaking changes to existing functionality.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/actions/llm/generation.py | 5/5 | Implemented lazy initialization for embedding indexes with proper double-checked locking pattern. Removed eager init() call and unnecessary imports (threading, patch_asyncio). All action methods properly call ensure methods before index usage. |
| nemoguardrails/actions/v2_x/generation.py | 5/5 | Added lazy initialization for V2.x with _ensure_flows_index() and _ensure_instruction_flows_index() methods. Properly handles instruction_flows_index attribute that may not exist initially using hasattr checks. |
| tests/test_actions_llm_embedding_lazy_init.py | 5/5 | Comprehensive test coverage for V1.0 lazy initialization including verification that indexes are None at init, FastEmbed cache remains empty for simple configs, and indexes are initialized on first use. |
| tests/v2_x/test_llm_embedding_lazy_init.py | 5/5 | Test coverage for V2.x lazy initialization including passthrough mode and dialog configurations. Verifies indexes and instruction_flows_index remain uninitialized for simple configs. |
Sequence Diagram
sequenceDiagram
participant User
participant LLMRails
participant LLMGenerationActions
participant EmbeddingIndex
Note over LLMRails,LLMGenerationActions: Before PR: Eager Initialization
User->>LLMRails: __init__(config)
LLMRails->>LLMGenerationActions: __init__()
LLMGenerationActions->>LLMGenerationActions: init() in thread
LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded
LLMGenerationActions->>EmbeddingIndex: _init_bot_message_index()
LLMGenerationActions->>EmbeddingIndex: _init_flows_index()
LLMGenerationActions-->>LLMRails: Ready (all indexes loaded)
Note over LLMRails,LLMGenerationActions: After PR: Lazy Initialization
User->>LLMRails: __init__(config)
LLMRails->>LLMGenerationActions: __init__()
Note over LLMGenerationActions: No init() call!
LLMGenerationActions-->>LLMRails: Ready (no indexes loaded)
User->>LLMRails: generate(messages)
LLMRails->>LLMGenerationActions: generate_user_intent()
LLMGenerationActions->>LLMGenerationActions: _ensure_user_message_index()
alt user_message_index is None and user_messages exist
LLMGenerationActions->>LLMGenerationActions: acquire _init_lock
LLMGenerationActions->>LLMGenerationActions: check again if None
LLMGenerationActions->>EmbeddingIndex: _init_user_message_index()
EmbeddingIndex-->>LLMGenerationActions: FastEmbed loaded (lazy)
end
LLMGenerationActions->>EmbeddingIndex: search(text)
EmbeddingIndex-->>LLMGenerationActions: results
LLMGenerationActions-->>LLMRails: intent
Note over User,EmbeddingIndex: Input/Output Rails Only: No Indexes Loaded
User->>LLMRails: __init__(config with input rails)
LLMRails->>LLMGenerationActions: __init__()
LLMGenerationActions-->>LLMRails: Ready (no indexes)
User->>LLMRails: generate(messages)
Note over LLMGenerationActions: No embedding indexes needed
LLMRails-->>User: response (FastEmbed never loaded)
Uses asyncio.Lock with double-checked locking pattern to prevent race conditions when multiple async tasks call _ensure_*_index() methods concurrently. Also fixes tests to use TestChat (with FakeLLM) instead of creating LLMRails directly, and adds skip conditions for fastembed tests.
46e4df9 to
ce6cd1e
Compare
trebedea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but we should also remove the init() function.
Description
Implements lazy initialization of embedding indexes (
user_message_index,bot_message_index,flows_index,instruction_flows_index) so FastEmbed is only loaded when semantic search is actually needed.Previously, embedding indexes were eagerly initialized at
LLMRailsconstruction time, causing FastEmbed models to be downloaded even for simple configurations that only use input/output rails or passthrough mode.Behavior by Configuration
self check input)self check output)generate()Implementation
init()call fromLLMGenerationActions.__init__()_ensure_*_index()helper methods for lazy initializationLLMGenerationActionsV2dotxTest Plan
tests/test_actions_llm_embedding_lazy_init.py)tests/v2_x/test_llm_embedding_lazy_init.py)Noneat initialization